24 research outputs found

    Automating the usage of unambiguous NOEs in nuclear vector replacement for NMR protein structure-based assignments

    Proteins perform various functions and tasks in living organisms. The structure of a protein is essential in identifying the protein function. Therefore, determining the protein structure is of utmost importance. Nuclear Magnetic Resonance (NMR) is one of the experimental methods used to determine the protein structure. The key bottleneck in NMR protein structure determination is assigning NMR peaks to corresponding nuclei, which is known as the assignment problem. This assignment process is manually performed in many laboratories. In this thesis, we have developed methodologies and software to automate this process. Structure Based Assignment (SBA) is an approach to solve this computationally challenging problem by using prior information about the protein that is obtained from a template structure. NVR-BIP is an approach that uses the Nuclear Vector Replacement (NVR) framework to model SBA as a binary integer programming problem. NVR-TS is a tabu search algorithm equipped with a guided perturbation mechanism to handle proteins with larger numbers of residues. NVR-ACO is an ant colony optimization approach, inspired by the behavior of living ants, that minimizes the peak-nuclei matching cost. One of the inputs to these approaches is the Nuclear Overhauser Effect (NOE) data. An NOE is an interaction observed between two protons if the protons are located close in space. These protons could be amide protons (HN), protons attached to the alpha-carbon atom in the backbone of the protein (HA), or side chain protons. NVR only uses backbone protons. In the previous approaches using the NVR framework, the proton type was not distinguished in the NOEs and only the HN coordinates were used to incorporate the NOEs into the computation. In this thesis, we fix this problem and use both the HA and HN coordinates and the corresponding distances in our computations.
    In addition, in the previous studies within this context, the distance threshold value for the NOEs was manually tuned for different proteins. However, this limits the application of the methodology to novel proteins. In this thesis, we set the threshold value in a standard manner for all proteins by extracting the NOE upper bound distances from the data. Furthermore, for Maltose Binding Protein (MBP), we extract the NOE upper bound distances directly from the NMR peak intensity values and test this protein on real NMR data. We tested our approach on NVR-ACO's data set and compared our new approaches with NVR-BIP, NVR-TS, and NVR-ACO. The experimental results show that the proposed approach improves the assignment accuracies significantly. In particular, we achieved 100% assignment accuracy on EIN and 80% on MBP, compared to 83% and 73%, respectively, obtained with the previous approaches.
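To make the assignment problem in the abstract above concrete, here is a minimal sketch of peak-to-residue matching as a minimum-cost one-to-one assignment. It is not the NVR-BIP, NVR-TS, or NVR-ACO method: the brute-force search, the function name `best_assignment`, and the toy cost matrix are all assumptions made for illustration, and a real scoring function would combine NOEs with other data.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimizing the sum of cost[peak][residue].

    assignment[i] is the residue index matched to peak i. The search is brute
    force over all permutations, so it is only usable for a handful of peaks.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[peak][res] for peak, res in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical 3x3 cost matrix: rows are peaks, columns are residues.
toy_cost = [
    [1.0, 4.0, 5.0],
    [3.0, 1.5, 6.0],
    [4.0, 2.0, 0.5],
]
total, assignment = best_assignment(toy_cost)
```

The factorial growth of this search is one way to see why the abstract turns to integer programming and metaheuristics for proteins with many residues.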

    PCSF: An R-package for network-based interpretation of high-throughput data

    With recent technological developments, a vast amount of high-throughput data has been profiled to understand the mechanisms of complex diseases. The current bioinformatics challenge is to interpret the data and the underlying biology, where efficient algorithms for analyzing heterogeneous high-throughput data using biological networks are becoming increasingly valuable. In this paper, we propose a software package based on the Prize-collecting Steiner Forest graph optimization approach. The PCSF package performs fast and user-friendly network analysis of high-throughput data by mapping the data onto biological networks such as protein-protein interaction, gene-gene interaction, or any other correlation or coexpression based networks. Using the interaction networks as a template, it determines high-confidence subnetworks relevant to the data, which potentially leads to predictions of functional units. It also interactively visualizes the resulting subnetwork with functional enrichment analysis.

    A matheuristic algorithm for the Prize-collecting Steiner Tree Problem

    No full text
    The Prize-collecting Steiner Tree Problem (PCSTP) is a well-studied problem in combinatorial optimization. It has a wide range of applications in the literature, for instance in the design of fiber-optic, gas distribution, and district heating networks. In this study, we focus on its application in functional analysis of genes on bio-genetic graphs. In bio-genetics it is very common to have huge graphs to interpret. Since the PCSTP is NP-hard, it is time consuming to obtain solutions for large instances. Thus, there is a need for efficient and fast heuristic algorithms to discover the hidden knowledge behind the vast bio-genetic networks. We propose a matheuristic composed of a heuristic clustering algorithm and an existing mixed integer linear programming formulation to solve the PCSTP. We evaluated the performance of our matheuristic on available real-world benchmark instances from biology and compared it with an existing heuristic approach in the literature. Compared with the heuristic results, we obtained solutions with similar or better objective function values. On the other hand, the existing heuristic solved the benchmark instances in less running time than the proposed matheuristic.
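The objective function values mentioned above can be illustrated with a small sketch. It assumes the common minimization form of the PCSTP, in which a tree pays the cost of its edges plus the prizes of the nodes it leaves out; the abstract does not say which variant the authors use. The function name `pcst_cost` and the toy graph are hypothetical, and the function only scores a candidate without checking that it is a connected tree.

```python
def pcst_cost(edge_costs, prizes, tree_nodes, tree_edges):
    """Score a candidate solution: selected edge costs plus prizes of uncovered nodes.

    edge_costs maps an edge (u, v) to its cost, prizes maps a node to its prize.
    The caller is responsible for passing a connected, acyclic selection.
    """
    edge_total = sum(edge_costs[edge] for edge in tree_edges)
    missed_prizes = sum(p for node, p in prizes.items() if node not in tree_nodes)
    return edge_total + missed_prizes


# Hypothetical path graph a - b - c with one cheap and one expensive edge.
toy_edges = {("a", "b"): 1.0, ("b", "c"): 5.0}
toy_prizes = {"a": 3.0, "b": 0.0, "c": 2.0}

small_tree = pcst_cost(toy_edges, toy_prizes, {"a", "b"}, [("a", "b")])
full_tree = pcst_cost(toy_edges, toy_prizes, {"a", "b", "c"}, list(toy_edges))
```

In this toy instance, skipping node c is cheaper than paying for the expensive edge that reaches it, which is the trade-off a PCSTP solver has to weigh.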

    A divide and conquer matheuristic algorithm for the Prize-collecting Steiner Tree Problem

    No full text
    The Prize-collecting Steiner Tree Problem (PCSTP) is a well-known problem in graph theory and combinatorial optimization. It has been successfully applied to solve real problems such as fiber-optic and gas distribution network design. In this work, we concentrate on its application in biology to perform a functional analysis of genes. It is common to analyze large networks in genomics to infer hidden knowledge. Due to the NP-hard nature of the PCSTP, it is computationally costly, if possible at all, to achieve exact solutions for such huge instances. Therefore, there is a need for fast and efficient matheuristic algorithms to explore and understand the concealed information in huge biological graphs. In this study, we propose a matheuristic method based on a clustering algorithm. The main target of the method is to scale up the applicability of the currently available exact methods to large graph instances, without losing too much solution quality. The proposed matheuristic method is composed of preprocessing procedures, a heuristic clustering algorithm, and an exact solver for the PCSTP, applied on sub-graphs. We examine the performance of the proposed method on real-world benchmark instances from biology, and compare its results with those of the exact solver alone, without the heuristic clustering. We obtain solutions in shorter execution time and with negligible optimality gaps. This enables analyzing very large biological networks with the currently available exact solvers.

    A Comparison of Heuristic Methods for the Prize-Collecting Steiner Tree Problem and Their Application in Genomics

    No full text
    The prize-collecting Steiner tree (PCST) problem is a broadly studied problem in combinatorial optimization. It has been used to model several real-world problems related to utility networks. More recently, researchers have started using PCSTs to study biological networks. Biological networks are typically very large. This can create a considerable challenge for the available PCST solving methods. Taking this fact into account, we have developed methods for the PCST that efficiently scale up to large biological network instances. Namely, we have devised a heuristic method based on the Minimum Spanning Tree and a matheuristic method composed of a heuristic clustering phase and a solution phase. In this work, we provide a performance comparison for these methods by testing them on large gene interaction networks. Experimental results are reported for the methods, including running times and objective values of the solutions.

    A drug similarity network for understanding drug mechanism of action

    No full text
    Chemogenomic experiments, where genetic and chemical perturbations are combined, provide data for discovering the relationships between genotype and phenotype. Traditionally, analysis of chemogenomic datasets has been done considering the sensitivity of the deletion strains to chemicals, and this has shed light on drug mechanism of action and detecting drug targets. Here, we computationally analyzed a large chemogenomic dataset, which combines more than 300 chemicals with virtually all gene deletion strains in the yeast S. cerevisiae. In addition to sensitivity relation between deletion strains and chemicals, we also considered the deletion strains that are resistant to chemicals. We found a small set of genes whose deletion makes the cell resistant to many chemicals. Curiously, these genes were enriched for functions related to RNA metabolism. Our approach allowed us to generate a network of drugs and genes that are connected with resistance or sensitivity relationships. As a quality assessment, we showed that the higher order motifs found in this network are consistent with biological expectations. Finally, we constructed a biologically relevant network projection pertaining to drug similarities, and analyzed this network projection in detail. We propose this drug similarity network as a useful tool for understanding drug mechanism of action

    A prize-collecting Steiner tree application for signature selection to stratify diffuse large B-cell lymphoma subtypes

    No full text
    Background: With the explosion of high-throughput data available in biology, the bottleneck has shifted to effective data interpretation. By taking advantage of the available data, it is possible to identify biomarkers and signatures that distinguish subtypes of a specific cancer in the context of clinical trials. This requires sophisticated methods to retrieve the information from the data, and various algorithms have recently been devised. Results: Here, we applied the prize-collecting Steiner tree (PCST) approach to obtain a gene expression signature for the classification of diffuse large B-cell lymphoma (DLBCL). The PCST is a network-based approach to capture new insights about genomic data by incorporating an interaction network landscape. Moreover, we adopted ElasticNet incorporating PCA as a classification method. We used seven public gene expression profiling datasets (three for training and four for testing) available in the literature, and obtained 10 genes as a signature. We tested these genes by employing ElasticNet, and compared the performance with the DAC algorithm as the current gold standard. The PCST signature with ElasticNet outperformed DAC in distinguishing the subtypes. In addition, the gene expression signature was able to accurately stratify DLBCL patients on survival data. Conclusions: We developed a network-based optimization technique that performs unbiased signature selection by integrating genomic data with biological networks. Our classifier trained with the obtained signature outperformed the state-of-the-art method in subtype distinction and survival data stratification in DLBCL. The proposed method is a general approach that can be applied to other classification problems.

    Distinguishing the type of NOE for NMR protein structure-based assignments

    No full text
    The key bottleneck of Nuclear Magnetic Resonance (NMR) studies is to map the NMR peaks to corresponding nuclei, also known as the assignment problem. Structure Based Assignment (SBA) is an approach to solve this computationally challenging problem by using prior information about the protein obtained from a homologous structure. NVR-BIP uses the Nuclear Vector Replacement (NVR) framework to model SBA as a binary integer programming problem. NVR-TS is a tabu search (TS) based approach equipped with a guided perturbation mechanism to handle proteins with larger numbers of amino acids. One of the principal data sources used by these programs is the Nuclear Overhauser Effect (NOE). The NOE is an effect measured between any pair of protons closer than 6 Å. In these previous approaches, the proton type was not distinguished in NOEs and only HN coordinates were used to incorporate the NOEs into the computation. In this study, we fix this problem and use both HA and HN coordinates and the corresponding distances in computation. We update the problem formulation of NVR-BIP accordingly and formulate the modified NVR-BIP. We also adapt the tabu search algorithm (NVR-TS) to utilize HA and HN coordinates. Our results indicate that the new approach improves the assignment accuracies significantly.
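The role of HN and HA coordinates can be sketched with a simple distance check: given proton positions from a template structure, list the proton pairs that are close enough to give rise to an NOE. This is an illustration only, not the modified NVR-BIP formulation. The function name `noe_compatible_pairs`, the label format, and the toy coordinates are assumptions; the 6 Å cutoff is the value quoted in the abstract.

```python
from math import dist

NOE_CUTOFF_ANGSTROM = 6.0  # distance threshold quoted in the abstract


def noe_compatible_pairs(protons, cutoff=NOE_CUTOFF_ANGSTROM):
    """Return pairs of proton labels whose Euclidean distance is below the cutoff.

    protons maps a label such as (residue_index, "HN") to an (x, y, z) tuple
    in angstroms. Pairs are reported in sorted label order.
    """
    labels = sorted(protons)
    pairs = []
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            if dist(protons[first], protons[second]) < cutoff:
                pairs.append((first, second))
    return pairs


# Hypothetical coordinates: two protons on residue 1, one distant proton on residue 2.
toy_protons = {
    (1, "HN"): (0.0, 0.0, 0.0),
    (1, "HA"): (2.0, 0.0, 0.0),
    (2, "HN"): (0.0, 7.0, 0.0),
}
close_pairs = noe_compatible_pairs(toy_protons)
```

Keeping the proton type in the label is what lets an HN-HA contact be told apart from an HN-HN contact, which is the distinction the abstract says earlier approaches did not make.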